File name and folder restructure #124

gabotechs · 2025-09-04T20:21:10Z

This PR reorganizes the project by colocating related pieces into modules, grouped by the feature they provide.

Because the new folder structure keeps related code together, it lets us tighten the visibility of some structs/methods so that they are not exposed to unrelated modules, which reduces the chances of future coupling.
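As a rough illustration of the kind of visibility tightening described above, here is a minimal sketch. The module name mirrors the new `protobuf/` folder, but `StageProto`, `encode_stage_num`, `decode_stage_num` and `round_trip` are hypothetical names made up for this example, not items from the PR.

```rust
// Hypothetical sketch: names below are illustrative, not the project's real items.
mod protobuf {
    use std::convert::TryFrom;

    // No `pub`: the wire-level type can only be named inside this module.
    struct StageProto {
        num: u64,
    }

    // `pub(crate)`: usable anywhere in this crate, invisible to downstream crates.
    pub(crate) fn encode_stage_num(num: usize) -> Vec<u8> {
        let proto = StageProto { num: num as u64 };
        proto.num.to_le_bytes().to_vec()
    }

    // Returns None if the input is not exactly 8 bytes or does not fit in a usize.
    pub(crate) fn decode_stage_num(bytes: &[u8]) -> Option<usize> {
        let raw = <[u8; 8]>::try_from(bytes).ok()?;
        let proto = StageProto { num: u64::from_le_bytes(raw) };
        usize::try_from(proto.num).ok()
    }
}

// Code outside `protobuf` only sees the crate-visible functions, never `StageProto`.
fn round_trip(num: usize) -> Option<usize> {
    let bytes = protobuf::encode_stage_num(num);
    protobuf::decode_stage_num(&bytes)
}
```

With this shape, a caller outside the module that tries to write `protobuf::StageProto { num: 1 }` gets a privacy error at compile time, which is the kind of accidental coupling the restructure is meant to prevent.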

The code is mostly untouched; only a couple of changes were needed to make this possible:

The ExecutionTask now has two forms:
- ExecutionTask: struct with the Url already parsed and the partition groups already as usizes, so that we don't need to juggle Url parsing or u64-to-usize casts throughout the code
- ExecutionTaskProto: the protobuf representation, decoupled from the struct the rest of the code actually works with

The test for the do_get method was using the proto representations of the ExecutionStage and the ExecutionTask directly, but they are no longer exposed to the outside. The test now builds the normal structs instead, which are then converted to protos with the proto_to_stage function. This actually results in a nice cleanup that saves some lines of code.
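The split between ExecutionTask and ExecutionTaskProto could look roughly like the sketch below. This is not the code from the PR: the field names (`url_str`, `partition_group`) are assumptions, and `ParsedUrl` is a tiny stand-in for a real Url type so that the example needs no external crate.

```rust
use std::convert::TryFrom;

// Wire-level shape (hypothetical fields): plain strings and u64s.
#[derive(Debug, Clone, PartialEq)]
pub struct ExecutionTaskProto {
    pub url_str: Option<String>,
    pub partition_group: Vec<u64>,
}

// Stand-in for a parsed Url type; only splits "scheme://rest".
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedUrl {
    pub scheme: String,
    pub rest: String,
}

impl ParsedUrl {
    fn parse(s: &str) -> Result<Self, String> {
        match s.split_once("://") {
            Some((scheme, rest)) if !scheme.is_empty() && !rest.is_empty() => Ok(Self {
                scheme: scheme.to_string(),
                rest: rest.to_string(),
            }),
            _ => Err(format!("invalid url: {}", s)),
        }
    }
}

// In-memory shape: the url is already parsed and partitions are usizes.
#[derive(Debug, Clone, PartialEq)]
pub struct ExecutionTask {
    pub url: Option<ParsedUrl>,
    pub partition_group: Vec<usize>,
}

impl TryFrom<ExecutionTaskProto> for ExecutionTask {
    type Error = String;

    // All parsing and integer conversion happens once, at the proto boundary.
    fn try_from(proto: ExecutionTaskProto) -> Result<Self, Self::Error> {
        let url = proto
            .url_str
            .as_deref()
            .map(ParsedUrl::parse)
            .transpose()?;
        let partition_group = proto
            .partition_group
            .into_iter()
            .map(|p| usize::try_from(p).map_err(|e| e.to_string()))
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Self { url, partition_group })
    }
}
```

Calling `ExecutionTask::try_from(proto)` once when a message is decoded means the rest of the code never has to handle a malformed url string or an out-of-range partition index.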

common/
    mod.rs
    composed_extension_codec.rs
    ttl_map.rs
errors/
    mod.rs
    arrow_error.rs
    datafusion_error.rs
    io_error.rs
    objectstore_error.rs
    parquet_error.rs
    parser_error.rs
    schema_error.rs
execution_plans/
    mod.rs
    arrow_flight_read.rs
    partition_isolator.rs
    stage.rs
flight_service/
    mod.rs
    do_get.rs
    service.rs
    session_builder.rs
protobuf/
    mod.rs
    distributed_codec.rs
    stage_proto.rs
    user_codec.rs
test_utils/
    mod.rs
    insta.rs
    localhost.rs
    mock_exec.rs
    parquet.rs
    tpch.rs
lib.rs
channel_resolver_ext.rs
config_extension_ext.rs
distributed_ext.rs
distributed_physical_optimizer_rule.rs

NGA-TRAN

Nice refactor and rename.

NGA-TRAN · 2025-09-05T13:36:19Z

src/distributed_physical_optimizer_rule.rs

 mod tests {
    use crate::assert_snapshot;
-    use crate::physical_optimizer::DistributedPhysicalOptimizerRule;
+    use crate::distributed_physical_optimizer_rule::DistributedPhysicalOptimizerRule;


I like that you renamed it to distributed_physical_optimizer_rule. It makes things clearer.

Yeah, I've tried to rename files to be a bit more consistent with their content.

NGA-TRAN · 2025-09-05T13:37:02Z

src/distributed_physical_optimizer_rule.rs

    ) -> Result<Arc<dyn ExecutionPlan>> {
        // We can only optimize plans that are not already distributed
-        if plan.as_any().is::<ExecutionStage>() {
+        if plan.as_any().is::<StageExec>() {


StageExec makes it more like DF style 👍

Yeah, it seems to be an unwritten rule to suffix all ExecutionPlan implementations with *Exec.
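For readers unfamiliar with the `plan.as_any().is::<StageExec>()` check in the diff above, here is a minimal std-only sketch of the pattern. `Plan` is a simplified stand-in for the ExecutionPlan trait, and `LeafExec` and `is_already_distributed` are hypothetical names used only for illustration.

```rust
use std::any::Any;
use std::sync::Arc;

// Simplified stand-in for the ExecutionPlan trait: only `as_any` is modeled.
trait Plan {
    fn as_any(&self) -> &dyn Any;
}

// Follows the *Exec naming convention discussed above.
struct StageExec;
// Hypothetical non-distributed node.
struct LeafExec;

impl Plan for StageExec {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Plan for LeafExec {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// `Any::is::<T>()` compares the concrete type behind the trait object with T.
fn is_already_distributed(plan: &Arc<dyn Plan>) -> bool {
    plan.as_any().is::<StageExec>()
}
```

The check only looks at the root node passed in, so a plan that merely contains a `StageExec` somewhere below the root would not be detected by this sketch.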

NGA-TRAN · 2025-09-05T13:49:53Z

src/flight_service/do_get.rs

+    // Helper to create a mock physical plan
+    fn create_mock_physical_plan(partitions: usize) -> Arc<dyn ExecutionPlan> {
+        let node = Arc::new(EmptyExec::new(SchemaRef::new(Schema::empty())));
+        Arc::new(RepartitionExec::try_new(node, Partitioning::RoundRobinBatch(partitions)).unwrap())


This isn’t directly related to this PR, but might be something we consider updating or adding in a follow-up.

This RoundRobinBatch got me thinking 🤔. On one hand, it's a valid scenario we want our tests to cover. On the other, round-robin repartitioning isn’t ideal from a performance standpoint. How about we introduce a variety of partitioning schemes in our mock tests? That way, we can surface more realistic and appropriate cases while still including edge scenarios like this one.

Yeah, we should do something with RoundRobin repartitions...

We probably should extend our parquet files test datasets so that we can cover more cases and more esoteric distribution patterns.

For example, it would be nice to have two test parquet files with related data that we can use to stress joins in weird ways.

Probably something to follow up on soon.

gabotechs force-pushed the gabrielmusat/bigger-tpch-tests branch 2 times, most recently from 898fbf0 to c01c11c · September 5, 2025 06:03

Base automatically changed from gabrielmusat/bigger-tpch-tests to main September 5, 2025 06:10

gabotechs added 5 commits September 5, 2025 08:11

move plans/ to execution_plans/

e3096a6

Create protobuf/ folder

f626737

Move stage along with the other execution plan implementations

3985f3d

Rename channel manager to channel resolver

39ff6e5

Rename physical_optimizer.rs to distributed_physical_optimizer_rule.rs

355da32

gabotechs force-pushed the gabrielmusat/cleanup branch from 8d7ba36 to 355da32 · September 5, 2025 06:11

NGA-TRAN approved these changes Sep 5, 2025

gabotechs merged commit b728287 into main Sep 5, 2025
4 checks passed

gabotechs deleted the gabrielmusat/cleanup branch September 5, 2025 15:11
